Introduction

This document contains the documentation of the data analysis process for the component T01 from the company “201”. The aim of the analysis was to identify all defective components and which engines and cars they are part of. The data analysis process includes the import of the data, the cleaning up of the data, the transformation of the data to obtain the relevant information and the development of a shiny app to communicate the results.

The following packages were used to perform the data analysis.

#Add the packages which are used during the analysis
install.packages("tidyverse")
install.packages("lubridate")
install.packages("rmarkdown")
install.packages("knitr")
install.packages("shiny")
install.packages("shinyWidgets")
install.packages("leaflet")
install.packages("leaflet.extras")
library(tidyverse)
library(lubridate)
library(rmarkdown)
library(knitr)
library(shiny)
library(shinyWidgets)
library(leaflet)
library(leaflet.extras)

Import Data

The task was to identify all engines and cars which contains a defective component and to inform the affected customers and OEMs about it. Based on the task and the information given it was decided which datasets are relevant for the data analysis. The component “T01” is a standard component of the car brands “OEM1” and “OEM2”. Because of that every car of those two manufacturers and every engine which they use contains the component. Which engines are a component of which car brand can be found in the following table:

Engine of the cars
Engine_OEM1 Engine_OEM2
Gasoline_Engines_K1BE1 Gasoline_Engines_K1BE1
Diesel_Engine_K1DI1 Diesel_Engine_K1DI1

Based on this information the relevant component, engine and car data frames were imported. The relevant data includes the component T01, the cars from OEM1 and OEM2 and the engines inside those cars. To connect the datasets of the cars, engines and components, the datasets describing the relationship between these parts were also imported. Furthermore, we also need the locations where the cars were registered. The task is to inform the customer about defective products. Based on that the datasets related to the registration and the geographical data were imported. The geodata was imported specifically for the visualization of the data later on. In the following table, you can find all the data frames imported for the analysis:

Imported Data
component, Engine, Cars Relation Tables Location Related
Einzelteil_T01.txt Bestandteile_Komponente_K1BE1.csv Geodaten_Gemeinden_v1.2_2017-08-22_TrR.csv
Fahrzeuge_OEM1_Typ11.csv Bestandteile_Komponente_K1BE2.csv Zulassungen_alle_Fahrzeuge.csv
Fahrzeuge_OEM1_Typ12.csv Bestandteile_Komponente_K1DI1.csv
Fahrzeuge_OEM2_Typ21.csv Bestandteile_Komponente_K1DI2.csv
Fahrzeuge_OEM2_Typ22.csv Bestandteile_Komponente_OEM1_Typ11.csv
Komponente_K1BE1.csv Bestandteile_Komponente_OEM1_Typ12.csv
Komponente_K1BE2.csv Bestandteile_Komponente_OEM2_Typ21.csv
Komponente_K1DI1.csv Bestandteile_Komponente_OEM2_Typ22.csv
Komponente_K1DI2.txt

How to import the data?

The files were imported with functions from the readr - package. Depending on the delimiter the CSV-files were either imported with the function read_csv() (delimiter: “,”) or read_csv2() (delimiter: “;”). An exception is the datasets for the engines K1DI1 and K1DI2 and the component T01. They were imported with the functions of the Utils - package. Importing them with the functions of the readr package would have lead to parsing errors because the data was split into three different parts of the table (T01, K1DI1) or the name of the columns would have been wrong (K1DI2). They were turned to the class tibble to have a consistent class for all the datasets. Also for the import of the datasets of T01 and the engine K1DI2 some extra steps needed to be done. This steps can be found below the following code.

#Import of all relevant files related to components, engines and cars
oem1_11 <- read_csv("Data/Fahrzeug/Fahrzeuge_OEM1_Typ11.csv")
oem1_12 <- read_csv2("Data/Fahrzeug/Fahrzeuge_OEM1_Typ12.csv")
oem2_21 <- read_csv("Data/Fahrzeug/Fahrzeuge_OEM2_Typ21.csv")
oem2_22 <- read_csv2("Data/Fahrzeug/Fahrzeuge_OEM2_Typ22.csv")

k1be1 <- read_csv("Data/Komponente/Komponente_K1BE1.csv")
k1be2 <- read_csv2("Data/Komponente/Komponente_K1BE2.csv")

#Importing with the utils package because with the readr package we would have a parsing error and a lot of data is shown as NA even though there is data. The parsing error happens because the data is split in different columns
k1di1 <- as.tibble(read.csv("Data/Komponente/Komponente_K1DI1.csv", stringsAsFactors = FALSE))
  
#Import of the relation tables
relation_oem1_11 <- read_csv2("Data/Fahrzeug/Bestandteile_Fahrzeuge_OEM1_Typ11.csv")
relation_oem1_12 <- read_csv2("Data/Fahrzeug/Bestandteile_Fahrzeuge_OEM1_Typ12.csv")
relation_oem2_21 <- read_csv2("Data/Fahrzeug/Bestandteile_Fahrzeuge_OEM2_Typ21.csv")
relation_oem2_22 <- read_csv2("Data/Fahrzeug/Bestandteile_Fahrzeuge_OEM2_Typ22.csv")
  
relation_k1be1 <- read_csv2("Data/Komponente/Bestandteile_Komponente_K1BE1.csv")
relation_k1be2 <- read_csv2("Data/Komponente/Bestandteile_Komponente_K1BE2.csv")
relation_k1di1 <- read_csv2("Data/Komponente/Bestandteile_Komponente_K1DI1.csv")
relation_k1di2 <- read_csv2("Data/Komponente/Bestandteile_Komponente_K1DI2.csv")
  
#Import of the location related files
zulassung <- read_csv2("Data/Zulassungen/Zulassungen_alle_Fahrzeuge.csv")
geodata <- read_csv2("Data/Geodaten/Geodaten_Gemeinden_v1.2_2017-08-22_TrR.csv")

The files “Einzelteil_T01.txt” and “Komponente_K1DI2.txt” couldn’t be imported immediately with the above-described functions. The delimiter in both files was not common, which made the immediate import not possible. To overcome this problem the files were first read into R as a string. Afterward, the delimiter for a new line and the delimiter for a new cell were replaced with more common delimiter. Afterward, the file was saved as a CSV-document and imported with the corresponding function. The code for both can be seen below:

#Import the component K1DI2
#Read the file as String
k1di2_string <- read_file("Data/Komponente/Komponente_K1DI2.txt")
  
#replace the delimter with more common delimiter and save the document as csv-file in the directory
k1di2_csv <- gsub("\\\\", ";", k1di2_string)
k1di2_csv <- paste("\"\"," , k1di2_csv, sep = "")
k1di2_csv <- gsub("\t", "\n", k1di2_csv)
write(k1di2_csv, file = "Data/Komponente/Komponente_K1DI2.csv")
  
#import the saved csv-file
k1di2 <- as.tibble(read.csv2("Data/Komponente/Komponente_K1DI2.csv", stringsAsFactors = FALSE))
k1di2_string <- ""
k1di2_csv <- ""
gc()
#Import the component T01
#Read the file as String
t01_string <- read_file("Data/Einzelteil/Einzelteil_T01.txt")
  
#replace the delimter with more common delimiter and save the document as csv-file in the directory
t01_csv <- gsub(" \\| \\| ", ",", t01_string)
t01_csv <- paste("\"\"," , t01_csv, sep = "")
t01_csv <- gsub(" ", "\n", t01_csv)
write(t01_csv, file = "Data/Einzelteil/Einzelteil_T01.csv")
  
#import the saved csv-file
#Import it with the utils package because with the readr package we would have a parsing error and a lot of data is shown as NA even though there is data. The parsing error happens because the data is split on different columns
t01 <- as.tibble(read.csv("Data/Einzelteil/Einzelteil_T01.csv", stringsAsFactors = FALSE))
t01_string <- ""
gc()

Tidy Data

After importing the files the data frames needed to be clean up to have them in a form which is usable. The data frames should fulfill the following requirements:

To clean up the data, the name of the columns were fixed where needed, and the class of the columns was checked and changed where needed. Columns and rows not containing data were deleted. Also, columns were combined to showcase the data which was intended to be in the same column. In the following, you can see the details of what was done for each dataset. To get an overview of the dataset, the following commands were used:

You can find in the following a detailed overview of the clean up of the datasets and what was done. Even though not all the columns might be relevant for the shiny app, later on, everything was cleaned up. The idea behind it was, that in case we notice something which we thought was not relevant in the data will be later on relevant we don’t need to clean it up then.

Transform & Joining the Data

This part of the document will document the creation of the final dataset used to isolate and identify the relevant data for the shiny app. The datasets were reduced to the necessary data and then joined to create the final dataset.

Transform the relation tables

Because we are only interested in the relation between the t01 component, the engines, and the cars all columns which contain the ID of other components or engines were removed from the relation tables. For the relation table for the engines and the T01 component, just the ID for both remained. The same was done for the relation tables between cars and engines. In this case, the ID number for cars and engines remained in the dataset.

#transform the relation tables
relation_k1be1_analysis <- select(relation_k1be1, ID_Motor, ID_T1)
relation_k1be2_analysis <- select(relation_k1be2, ID_Motor, ID_T1)
relation_k1di1_analysis <- select(relation_k1di1, ID_Motor, ID_T1)
relation_k1di2_analysis <- select(relation_k1di2, ID_Motor, ID_T1)

relation_oem1_11_analysis <- relation_oem1_11[, -1:-4]
relation_oem1_12_analysis <- relation_oem1_12[, -1:-4]
relation_oem2_21_analysis <- relation_oem2_21[, -1:-4]
relation_oem2_22_analysis <- relation_oem2_22[, -1:-4]

Transform and join the T01, Engine, Car, Relation Tables

Afterward, the separated datasets of the engines, cars and the respective relation tables were combined to reduce the work for joining the datasets together later on.

engines <- rbind(k1be1_clean, k1be2_clean,k1di1_clean, k1di2_clean)
spare_engine <- rbind(relation_k1be1_analysis, relation_k1be2_analysis,relation_k1di1_analysis, relation_k1di2_analysis)
engine_car <- rbind(relation_oem1_11_analysis, relation_oem1_12_analysis, relation_oem2_21_analysis, relation_oem2_22_analysis)
cars <- rbind(oem1_11_clean, oem1_12_clean, oem2_21_clean, oem2_22_clean)

In the next step, the datasets were reduced to the relevant information. First of all the basics of the analysis are the defective T01 components of the company 201. The T01 data set was filtered regarding these two bits of information. For the datasets of the engines and the cars, we decided to keep only the column with the ID, which is necessary to join the datasets and the manufacturer number. This column is necessary for the shiny app, later on, to clearly attribute the defective cars to the respective engine manufacturers. The same was done for the dataset containing the cars. The columns showing if a car or an engine is defective were removed. When a component is defective also the component containing it is defective. The same goes for the relation between the component and the car. Concluding that a car containing a defective component is also defective. Based on that the information about the defective components is enough to identify the defective engines and cars. Also, the rest of the removed columns don’t give us relevant information regarding the task to isolate the defective cars and inform the car manufacturer and customers about them.

#filter for the defective spar parts produced by company 201 and select the relevant informatiosnipn to inform the customers and OEMs
t01_analysis <- filter(t01_clean, Hersteller_T1 == 201 & Fehlerhaft_T1 == 1) %>%
  select(ID_T1, Hersteller_T1, Fehlerhaft_T1)

#remove all columns except of the ID and the manufacturer
engines_analysis <- select(engines, ID_Motor, Herstellernummer_Motor)
cars_analysis <- select(cars, ID_Fahrzeug, Herstellernummer_Fahrzeug)

Joining the Data and reduce them to the needed information

The dataset which will be used was joined with the dataset for the component T01 as the basis. The following datasets were combined in the following order and by the following columns:

  1. The dataset containing the defective T01 and the relation tables between the component and components were joined by the column “ID_Motor”.
  2. The next dataset which was joined was the dataset of the engines also by the column “ID_Motor”.
  3. Afterward, the relation tables between the cars and components were joined by the column “ID_Fahrzeug”.
  4. The next dataset which was joined was the cars dataset by the column “ID_Fahrzeug”.
  5. The next dataset which was joined afterward contained the information regarding the registration. The dataset was joined by the column “ID_Fahrzeug”.
  6. Last but not least the dataset containing the Geodata was joined by the column “Ort”.

After joining the dataset together we filtered for all the defective cars which are also registered. We also removed the column “Fehlerhaft_T1” showing that a component is defective. The column was not needed anymore because all the components which were filtered until this point were defective.

#Joining the data with the defective T01 components as basis and filter for the cars with defective component which are registered
final <- left_join(t01_analysis, spare_engine, by  = "ID_T1") %>%
  left_join(., engines_analysis, by = "ID_Motor") %>%
  left_join(., engine_car, by = "ID_Motor") %>%
  left_join(., cars_analysis, by = "ID_Fahrzeug") %>%
  left_join(., zulassung_clean, by = "ID_Fahrzeug") %>%
  left_join(., geodata_clean, by = c("Ort")) %>%
  filter(!is.na(Zulassung))

#remove the column show casing that a T1 component is defect and the manufacturer of T1. It is not needed for the anaylisis later on
final <- subset(final, select = - c(Fehlerhaft_T1, Hersteller_T1) )

The final dataset has this structure:

Final Dataset Part 1
ID_T1 ID_Motor Herstellernummer_Motor ID_Fahrzeug Herstellernummer_Fahrzeug
1-201-2011-363 K1BE1-101-1011-11 101 12-1-12-4 1
1-201-2011-72 K1BE1-101-1011-55 101 12-1-12-6 1
1-201-2011-87 K1BE1-101-1011-97 101 11-1-11-4 1
1-201-2011-437 K1BE1-101-1011-44 101 11-1-11-11 1
1-201-2011-216 K1BE1-101-1011-75 101 11-1-11-28 1
1-201-2011-343 K1BE1-101-1011-13 101 12-1-12-12 1
1-201-2011-316 K1BE1-104-1041-195 104 12-1-12-34 1
Final Dataset Part 2
Ort Zulassung Postleitzahl Laengengrad Breitengrad
DORTMUND 2009-01-01 44135 7.464165 51.51417
DORTMUND 2009-01-01 44135 7.464165 51.51417
KOELN 2009-01-01 50667 6.957068 50.93811
RADEBEUL 2009-01-02 1445 13.659342 51.10598
RIESA 2009-01-02 1589 13.304633 51.30251
NUENCHRITZ 2009-01-02 1612 13.383291 51.30075
GLASHUETTE 2009-01-02 1768 13.781215 50.85193

With the final dataset we could identify the critical production amount. The amount of cars containing the defective spare part of company 201 is:

Amount of Defective Cars
defective_cars
80400

Shiny App

Shiny is a technology from the R environment that transforms analyses into interactive web applications that can be disseminated online on a website or in the company intranet, for example. The development of the application takes place completely in R, no HTML or JavaScript knowledge is required. The deployment of Shiny applications takes place via the Shiny Server, which is available both as open source and enterprise version. The server manages the provision of apps and can be accessed via intranet and internet. One part describes the structure of the application and is usually abbreviated as UI. This section of the code defines the entire layout of the application with all possible interactions, i.e. widgets. The second component is the server: Here, all outputs created in the UI are defined with R code, which can access all input values. The application is saved as one file (app.R) or as two files (ui.R and server.R).

Each user interface starts with the main function, which defines the website to be built for the application. This is a kind of basic framework for each application, which defines different options As soon as the user interface is defined, all outputs created in this way can be defined in the server script using conventional R code. Conceptually, the server section of an application is a function whose parameters are all possible inputs and the labels of the outputs created.

R-Studio then takes over the rendering of the front-end and you get a common HTML file, in whose backend R runs. The advantages of simplicity has, of course, its drawbacks. Shiny is rather limited in design and the placement of inputs like sliders, drop-downs or outputs like graphics or tables is very restricted.

With the dataset from before a shiny app was developed to communicate the results of the data. Two different plots were developed. The first one shows the development of the registration of the defective and registered cars over time in different areas. The second one is a heatmap which shows the concentration of the registered cars containing a defective T01 component.

Registration over the time

The task was to isolate the defective cars and inform the manufacturer and customer about it. Based on that we developed a plot which shows the development of the registration over time. The registration is shown accumulated over the month to give a better overview of the development. The data would have been harder to analyze if the development of the registration would have been shown over each day. Furthermore, the plot shows the proportion of registered and defective cars/engines per manufacturer. This is important for the manufacturer using this plot to identify which defective cars belongs to their responsibility.

It was decided to show the data over with a bar plot because it is also easier to separate the data by the manufacturers. A line plot would have made the observation of how many defective and registered cars belong to which manufacturer harder. At the same time, it is also possible to evaluate the development in a bar plot.

The plot can be filtered in the following aspects with the following reason:

  • Car manufacturer or Engine manufacturer: It is relevant which kind of manufacturer is looking at the data. If a car manufacturer is analyzing the plot and wants to see how many defective cars were registered, he is not interested which engine from which engine manufacturer is in the car but which cars belong to him. The same goes when an engine manufacturer is looking at the data.
  • Time Period: In case there is a relevant time period to watch. For example, if it is identified that the problem of the T01 component started at a certain time it is easier for the person to just watch on the data from that period.
  • Location: Some locations are more important to observe, especially the ones with the highest concentration of defective and registered cars. In combination with the heatmap, the manufacturer can just watch on the data of the location which have the highest amount of defective cars and with that are more important to be analyzed.
  • manufacturer Number: If one manufacturer is looking on the plot he is more interested in the cars which belong to him. The data which shows which cars belong to another manufacturer might be irrelevant for him because these cars are not in his responsibility. With that option, he can just find the registered cars belonging to him.

Concentration of the defective and registered cars

The second part of the shiny app task involved visualizing the distribution of cars containing defective parts in such a way as to allow the visual identification of the geographical areas most intensely affected by defects. To implement such a visualization, the leaflet package was chosen, as it provides a free-to-use and relatively easily implementable tools to visualize Data point on a map. Like the Graph depicting registrations over time, the leaflet widget allows the user to filter the results by car or engine manufacturer and by time period. The only difference in these filtering parameters is that the two filtering options can overlap in this context, allowing for more precise filtering results.

Additionally, the map offers two distinct ways of visualizing the distribution of the cars:

  • Heatmap: this feature of the shiny package allows to visualize the spatial distribution of a number of points on a map. Here it is used to display where there is a particularly high density of registrations of defective cars. By increasing the minimum number of registered defective cars, it is possible to focus on the metropolitan areas most affected. By decreasing the number, it is possible to visualize which larger regions are more affected.

  • Marker Clusters: with the addMarker functions, it is possible to map all instances of defective cars to the exact location of their registration. To help keep the view of the defective cars more manageable, the clusterOptions property of the addMarkers groups the single markers into large clusters, that adjust their size when the user zooms in on the map.

  • Pop-Ups another feature of the Leaflet Package is the possibility to add popup windows to the single marker. In this implementation, each popup window displays relevant information about the car. when multiple cars were registered in one location, they are displayed in a “spider view”. In cases when the number of cars registered is very large (such as, for instance, in Cologne), it is recommendable to use the Data table to explore the data.

Display of the Data in a table

The last part of the task consisted in making the whole dataset available to allow the verification of the visualized data. In order to achieve this, the dataset is being displayed using the renderDataTable function. Some columns are removed from the dataset before this, as they do not offer additional information to the user. The user has the option to either apply the same filters already used in the map view or to visualize the dataset as-is.

Usage example

The following examples shall serve to illustrate the usage of the Shiny app.

scenario 1:

Car manufacturer 2 wants to find out if an increase in Warranty claims in Cologne in the summer of 2009 corresponds to an increase of registered cars with faulty components in that location.

procedure:

  1. Set the date interval to display numbers between 1.6.2009 and 30.9.2009

  1. Choose car manufacturer 2

  1. Choose location “KOELN”

  1. Compare distribution to the distribution of warranty cases

scenario 2:

Engine manufacturer 101 wants to find out which location had the highest density of defective cars featuring their engine over the whole observed period.

procedure:

  1. Select the display mode “Heatmap”

  1. Set the filter for engine manufacturer and select manufacturer 101

  1. Zoom in on the area with the highest density (In this case the area around Düsseldorf) and set display mode to “Heatmap with Marker Clusters”

  1. Click on the cluster with the highest number, until markers are displayed. Clicking on the single markers will reveal information about one car, the motor in it, and the T1 component therein. (in this case there were 617 cars registered at the exact same location in Cologne, so the marker display could be too cluttered to be useful for further anlysis. Nevertheless, the approximate location with the highest density was found).

scenario 3:

While investigating the causes of an accident, insurance investigators want to find out if a car with the engine ID “K1BE2-104-1041-32050” was amongst the cars affected by the faulty component T1.

  1. Display unfiltered dataset, so the filters applied on the map don’t affect the search

  1. Write the ID into the search bar of the Data Table (in this case it can be confirmed that this engine was not in one of the affected cars)

The Shiny App

the code for the shiny app can be seen as follows

#install.packages("leaflet")
#install.packages("leaflet.extras")

library(shiny)
library(tidyverse)
library(shinyWidgets)
library(leaflet)
library(leaflet.extras)

#loading the final dataset resulting from the filtering and ordering
load("../Finaler_Datensatz_25.RData")

#uncomment the following line to load dataset faster
#final <- head(final, 500)


#create a data frame jsut containing the locations
places <- group_by(final, Ort) %>%
  summarise(n=n())

# Define UI for application that draws a histogram
ui <- fluidPage(
  
  # Application title
  titlePanel("Defective & Registered Cars"),
  
  
  # Show a plot of the generated distribution
  mainPanel(width = "100%",
            
            #input modules based on the date, location and car manufacturer
            wellPanel(width = "90%",
                      titlePanel("Filters for Displaying Registrations over Time"),
                      fluidRow(
                        column(width = 6, 
                               
                               dateRangeInput("registration", 
                                              label = "Choose the time period.",
                                              start = min(final$Zulassung),
                                              end = max(final$Zulassung),
                                              min = min(final$Zulassung),
                                              max = max(final$Zulassung),
                                              startview = "year",
                                              format = "yyyy-mm-dd"),
                               
                               radioButtons("manufacturer", 
                                            label = "Showcase the data by", 
                                            choices = c("Car manufacturer", "Engine manufacturer"),
                                            select = "Car manufacturer")
                        ),
                        
                        column(width= 6,
                               
                               pickerInput("location",
                                           label = "Choose the location to observe",
                                           choices =  c("All", places$Ort),
                                           selected = "All"),
                               
                               conditionalPanel("input.manufacturer == 'Car manufacturer'",
                                                
                                                selectInput("car_manufacturer", 
                                                            label = "Choose the Car manufacturer",
                                                            choices = c("All", 1, 2),
                                                            selected = "All")
                               ),
                               
                               conditionalPanel("input.manufacturer == 'Engine manufacturer'",
                                                
                                                selectInput("engine_manufacturer", 
                                                            label = "Choose the Engine manufacturer",
                                                            choices = c("All", 101, 102, 103, 104),
                                                            selected = "All")
                               )
                        )
                      ),
                      
                      
                      # Output of the plot related to the registration over the time
                      plotOutput("registration",
                                 width = "100%"  
                      )
            ),  
            
            wellPanel(width = "90%",
                      titlePanel("Filters for Displaying Geographical distribution"),
                      fluidRow(
                        #the input given by the user for the heatmap
                        column(6,
                               dateRangeInput("heatmap", 
                                              label = "Choose the date you want to have information on for the heatmap",
                                              start = min(final$Zulassung),
                                              end = max(final$Zulassung),
                                              min = min(final$Zulassung),
                                              max = max(final$Zulassung)),
                               
                               radioButtons("manufacturer2", 
                                            label = "Showcase the data by", 
                                            choices = c("Car manufacturer", "Engine manufacturer"),
                                            select = "Car manufacturer")
                               
                               
                        ),
                        #the user chooses whether to display the density numerically though clusters, or graphically through a heatmap
                        column(6,
                               selectInput("mapDisplay",
                                           label = "Select display mode of map",
                                           choices = c("Marker Clusters", "Heatmap", "Heatmap with Marker Clusters"),
                                           selected = "Heatmap with Marker Clusters"
                               ),
                               conditionalPanel("input.mapDisplay == 'Heatmap' | input.mapDisplay == 'Heatmap with Marker Clusters'",
                                                
                                                numericInput("fehleranzahl",
                                                             label = "Show location with minimum amount of registered defective cars",
                                                             value = 20,
                                                             min = 1,
                                                             max = 10000,
                                                             step = 1)
                               ),
                               conditionalPanel("input.manufacturer2 == 'Car manufacturer'",
                                                
                                                selectInput("car_manufacturer2", 
                                                            label = "Choose the Car manufacturer",
                                                            choices = c("All", 1, 2),
                                                            selected = "All")
                               ),
                               
                               conditionalPanel("input.manufacturer2 == 'Engine manufacturer'",
                                                
                                                selectInput("engine_manufacturer2", 
                                                            label = "Choose the Engine manufacturer",
                                                            choices = c("All", 101, 102, 103, 104),
                                                            selected = "All")
                               )
                               
                            )
                        ),
                      
                      titlePanel(
                        h4("Geographical Distribution of Cars")
                      ),
                      titlePanel(
                        h6("Click or scroll on the clusters to zoom in. Click the markers to reveal information about the car.")
                      ),
                      mainPanel(width = "90%",
                                
                                # Show the distribution of registered cars on a map
                                leafletOutput("gerMap", 
                                              width = "90%",
                                              height = "650px")
                      )
            ),
            wellPanel(width = "90%",
                      titlePanel("Tabular Representation of Data"),
                      #user has the choice of using the table to further explore the map, or to search for items individually
                      selectInput("mapFilters", 
                                  label = "Use filters from Map Display?",
                                  choices = c("Use Map Filters", "Display Unfiltered Dataset"),
                                  selected = "Use Map Filters"),
                      mainPanel(width = "100%",
                                dataTableOutput("table")    
                      )
                      
            )
            
  )
)

# Define server logic 
server <- function(input, output) {
  
  
  #filter dataset for sorting cars by car manufacturer
  registration <- reactive({
    
    count <- filter(final, Zulassung >= input$registration[1] & Zulassung <= input$registration[2])
    if (input$location != "All") {
      count <- filter(count, Ort == input$location)
    }
    if (input$car_manufacturer != "All") {
      count <- filter(count, Herstellernummer_Fahrzeug == input$car_manufacturer)
    }
    count$Zulassung <- format.Date(count$Zulassung, format = "%Y-%m")
    count$Zulassung <- as.character(count$Zulassung)
    count %>% group_by(Zulassung, Herstellernummer_Fahrzeug) %>%
      summarise(n = n())
    
  })
  #filter dataset for sorting cars by engine manufacturer
  registration_engine <- reactive({
    count <- filter(final, Zulassung >= input$registration[1] & Zulassung <= input$registration[2])
    if (input$location != "All") {
      count <- filter(count, Ort == input$location)
    }
    if (input$engine_manufacturer != "All") {
      count <- filter(count, Herstellernummer_Motor == input$engine_manufacturer)
    }
    count$Zulassung <- format.Date(count$Zulassung, format = "%Y-%m")
    count$Zulassung <- as.character(count$Zulassung)
    count %>% group_by(Zulassung, Herstellernummer_Motor) %>%
      summarise(n = n())
  })
  
  
  output$registration <- renderPlot({
    if (input$manufacturer == "Car manufacturer"){
      #create the plot based on the reactive registration() as bar plot
      p <- ggplot(data = registration(), aes(x = Zulassung, y = n, fill=factor(Herstellernummer_Fahrzeug))) +
        geom_bar(stat = "identity")
      #change the colours to shwocase the differen car manufacturer
      p <- p +scale_fill_manual(breaks = c("1", "2"),
                                values=c("#0B3B17", "#ff7f24"))
      #name axis, plot and legend
      p <- p + guides(fill = guide_legend(title="Car manufacturer"))
      p <- p + labs(x = "Date Range",
                    y = "Amount of Defective Cars registered",
                    title = "Registration of defective cars")
    } else {
      
      # create the plot based on the reactive registration_engine() as bar plot
      p <- ggplot(data = registration_engine(), aes(x = Zulassung, y = n, fill=factor(Herstellernummer_Motor))) +
        geom_bar(stat = "identity")
      # change the colours to shwocase the differen car manufacturer
      p <- p +scale_fill_manual(breaks = c("101", "102", "103", "104"),
                                values=c("#0B3B17", "#ff7f24", "#BEBEBE", "#a52a2a"))
      #name axis, plot and legend
      p <- p + guides(fill = guide_legend(title="Engine manufacturer"))
      p <- p + labs(x = "Date Range",
                    y = "Amount of Defective Cars registered",
                    title = "Registration of defective cars by Engine manufacturers")
      p
    }
    
    #format the plot in a nice format
    p <- p + theme(legend.position ="bottom", 
                   plot.title=element_text(size = 18, face="bold"), 
                   axis.title = element_text(size = 12),
                   legend.title.align = 0,
                   legend.direction = "horizontal",
                   legend.text = element_text(size = 12),
                   legend.title = element_text(size = 12, face = "bold"),
                   panel.background = element_rect(fill = "white", colour = "black"),
                   panel.grid.major.y = element_line(colour = "black", linetype = "solid"),
                   panel.grid.minor.y = element_line(colour = "black", linetype = "dotted"))
    p
    
  })
  
  filteredCars <- reactive ({
    #apply the filters as chosen by the user
    cars <- filter(final, Zulassung >= input$heatmap[1] & Zulassung <= input$heatmap[2])
    if (input$car_manufacturer2 != "All") {
      cars <- filter(cars, Herstellernummer_Fahrzeug == input$car_manufacturer2)
    }
    if (input$engine_manufacturer2 != "All") {
      cars <- filter(cars, Herstellernummer_Motor == input$engine_manufacturer2)
    }
    cars
  }
  )
  
  
  
  #summarise locations to reflect number of defective cars per location
  heatmap <- reactive ({
    if (input$mapDisplay == 'Marker Clusters'  ) {
      #ensure that the column names stay the same by taking the first row of the final dataset...
      heat <- head(final,1)
      #adding an empty "fehleranzahl" column...
      heat$fehleranzahl <- NA
      #setting output to an empty df, so that nothing can be displayed on the heatmap
      heat <- filter(heat, FALSE)
    } else {
      heat <- filteredCars() %>%
        group_by(Ort, Laengengrad, Breitengrad) %>%
        summarise(fehleranzahl = n()) %>%
        #filter by number of registered cars
        filter(fehleranzahl >= input$fehleranzahl)
    }
    heat
  })
  
  markers <- reactive ({
    if(input$mapDisplay == 'Heatmap') {
      #if the markers are not supposed to be displayed, this returns an empty data frame that still has the same column names
      markers <- filter(final, FALSE) 
    } else {
      markers <- filteredCars()
    }
    markers
  })
  #prepare a dataset to be displayed as an overview in the table
  browseTable <- reactive({
    if (input$mapFilters ==  "Use Map Filters"){
      table <- filteredCars()
    } else {
      table <- final
    }
    #removing unneeded information:
    #the gps coordinates are not really relevant to humans
    #the car and motor manufacturer numbers are already represented in in ID_Fahrzeug and ID_Motor respectively
    table %>% select(-Laengengrad,-Breitengrad,-Herstellernummer_Fahrzeug,-Herstellernummer_Motor)
  })
  
  
  #using the leaflet package, we can visualise the data on a map
  output$gerMap <- renderLeaflet(
    #take the dataset of cars that results from the filters that the user chose.
    leaflet(data = filteredCars()) %>% 
      #adds a rendered map, by default openstreetmap
      addTiles() %>%
      #set the bounds of the map according to the coordinates of the locations
      fitBounds(min(final$Laengengrad),min(final$Breitengrad),max(final$Laengengrad),max(final$Breitengrad)) %>%
      #adds a heatmap representation of the density of registraiton of faulty cars
      addHeatmap(data = heatmap(), lng = ~Laengengrad, lat = ~Breitengrad, 
                 intensity = ~fehleranzahl, blur = 12, max = 10.0, radius = 8) %>%
      #postions one marker for each entry in the table
      addMarkers(data = markers(), ~Laengengrad, ~Breitengrad, 
                 #display large amounts of markers as clusters
                 clusterOptions = markerClusterOptions(), 
                 #display information about the car corresponding to the marker being clicked
                 popup = ~paste("Zugelassen in: <b> " ,Ort,"</b> <br/>",
                                "PLZ: ", Postleitzahl, "<br/>",
                                "Zulassung: ", Zulassung, "<br/>",
                                "ID_Fahrzeug: ", ID_Fahrzeug, "<br/>",
                                "ID_Motor: ", ID_Motor, "<br/>",
                                "ID_T1: ", ID_T1, "<br/>"
                 )
                 
      )
  )
  #display the dataset in a Data Table that the user can search, filter, and sort
  output$table <- renderDataTable(
    browseTable(),
    options = list(
      pageLength = 10
    )
  )
  
  
}

# Run the application 
shinyApp(ui = ui, server = server)